AITopics | teacher-student pair

teacher-student pair

Distance OP

Neural Information Processing Systems, Apr-29-2026, 23:57:32 GMT

Conventional KD methods propose various designs to help the student model better imitate the teacher. However, these handcrafted KD designs rely heavily on expert knowledge and may be suboptimal for various teacher-student pairs.
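For reference, the "conventional KD" these entries build on is usually the temperature-scaled logit distillation of Hinton et al. (2015). The minimal pure-Python sketch below assumes that formulation: cross-entropy on the hard label plus a $T^2$-scaled KL term pulling the student toward the teacher's softened distribution. The function names and the defaults `temperature=4.0` and `alpha=0.9` are illustrative choices, not values taken from any paper listed here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (max-subtracted for stability)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.9):
    """Hinton-style KD loss for one example (assumed formulation, illustrative defaults).

    (1 - alpha) * CE(label, student) + alpha * T^2 * KL(teacher_T || student_T)
    """
    # Hard-label cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-target term: the student imitates the teacher's softened distribution.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = temperature ** 2 * kl_divergence(p_teacher, p_student)
    return (1 - alpha) * ce + alpha * soft


# Usage: a student that already matches the teacher pays only the hard-label term.
teacher = [1.0, 2.0, 3.0]
print(kd_loss([1.0, 2.0, 3.0], teacher, label=2))
print(kd_loss([3.0, 2.0, 1.0], teacher, label=2))  # larger: disagrees with label and teacher
```

In a real training loop the same loss would be averaged over a mini-batch and computed with a tensor library; the scalar version only makes the two terms explicit.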

distillation, evolutionary algorithm, machine learning

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry: Education (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.71)
Information Technology > Artificial Intelligence > Cognitive Science (0.70)

KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs

Neural Information Processing Systems, Feb-17-2026, 11:35:11 GMT

distillation, evolutionary algorithm, machine learning

Neural Information Processing Systems

Country: Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.93)

Industry: Education (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.71)
Information Technology > Artificial Intelligence > Cognitive Science (0.71)

cf5c369c1bc070361477008e3f5210ed-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 00:31:55 GMT

experiment, teacher-student pair, undistillable class

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.70)
Information Technology > Artificial Intelligence > Vision (0.69)

Teach Less, Learn More: On the Undistillable Classes in Knowledge Distillation

Neural Information Processing Systems, Feb-12-2026, 00:31:51 GMT

A counter-intuitive observation is that a larger teacher does not necessarily make a better student, but the reasons for this phenomenon remain unclear.
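The snippet above does not give an operational definition of an undistillable class, so the sketch below assumes one purely for illustration: a class is flagged if the KD student's per-class accuracy falls more than `margin` below that of the same student architecture trained without distillation. The helpers `per_class_accuracy` and `flag_undistillable` are hypothetical and are not the paper's procedure.

```python
from collections import defaultdict


def per_class_accuracy(labels, predictions):
    """Accuracy of `predictions` for each ground-truth class in `labels`."""
    correct, total = defaultdict(int), defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


def flag_undistillable(labels, kd_preds, baseline_preds, margin=0.0):
    """Classes where the KD student trails a non-distilled baseline by more than `margin`.

    Assumed, illustrative criterion; not necessarily the paper's definition.
    """
    kd_acc = per_class_accuracy(labels, kd_preds)
    base_acc = per_class_accuracy(labels, baseline_preds)
    return sorted(c for c in kd_acc if kd_acc[c] < base_acc[c] - margin)


labels = [0, 0, 1, 1, 2, 2]
kd_preds = [0, 0, 1, 0, 2, 1]        # student trained with KD (toy predictions)
baseline_preds = [0, 1, 1, 1, 2, 2]  # same student trained without KD (toy predictions)
print(flag_undistillable(labels, kd_preds, baseline_preds))  # [1, 2]
```

Comparing against a non-distilled baseline isolates classes where the teacher's signal appears to hurt rather than help; a nonzero `margin` guards against flagging classes whose accuracy differs only by noise.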

artificial intelligence, distillation, machine learning

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry: Education (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs

Neural Information Processing Systems, Dec-26-2025, 22:45:05 GMT

Knowledge distillation (KD) has emerged as an effective model-compression technique that can improve the performance of lightweight models. Conventional KD methods propose various designs to help the student model better imitate the teacher. However, these handcrafted KD designs rely heavily on expert knowledge and may be suboptimal for various teacher-student pairs. In this paper, we present a novel framework, KD-Zero, which uses evolutionary search to automatically discover promising distillers from scratch for any teacher-student architecture pair.
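The abstract describes discovering distillers by evolutionary search. The toy sketch below shows only the generic shape of such a search (truncation selection with elitism, uniform crossover, single-component mutation) over a hypothetical three-part search space. The option names in `SEARCH_SPACE` and the `toy_fitness` function are invented for illustration and are not KD-Zero's actual primitives or evaluation; in practice, scoring a candidate would presumably involve training the student with that distiller and measuring validation accuracy.

```python
import random

# Hypothetical search space: option names are illustrative, NOT KD-Zero's actual primitives.
SEARCH_SPACE = {
    "transform": ["identity", "softmax_T", "normalize", "attention_map"],
    "distance": ["kl", "mse", "l1", "cosine"],
    "weight": [0.1, 0.5, 1.0, 2.0],
}
KEYS = list(SEARCH_SPACE)


def random_genome(rng):
    """Sample one candidate distiller: one option per component."""
    return {k: rng.choice(SEARCH_SPACE[k]) for k in KEYS}


def crossover(a, b, rng):
    """Uniform crossover: each component comes from either parent with probability 1/2."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in KEYS}


def mutate(genome, rng):
    """Resample one randomly chosen component."""
    child = dict(genome)
    key = rng.choice(KEYS)
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child


def evolve(fitness, population_size=8, generations=20, seed=0):
    """Truncation-selection evolutionary search; the top half survives each generation."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # elitism: best candidates are kept
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)


# Stand-in fitness (hypothetical): counts components matching an arbitrary target,
# in place of an expensive "train the student and measure accuracy" evaluation.
TARGET = {"transform": "softmax_T", "distance": "kl", "weight": 1.0}


def toy_fitness(genome):
    return sum(genome[k] == TARGET[k] for k in KEYS)


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the top half always survives, the best fitness never decreases from one generation to the next; the cost of a real search is dominated by the fitness evaluations, which is why the population and generation counts here are kept small.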

evolving knowledge distiller, kd-zero, name change

Neural Information Processing Systems

Industry: Education (0.59)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.42)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.42)
Information Technology > Artificial Intelligence > Machine Learning (0.39)

Towards Efficient 3D Object Detection with Knowledge Distillation

Neural Information Processing Systems, Dec-24-2025, 16:41:05 GMT

Despite substantial progress in 3D object detection, advanced 3D detectors often suffer from heavy computational overhead. To address this, we explore the potential of knowledge distillation (KD) for developing efficient 3D object detectors, focusing on popular pillar- and voxel-based detectors. In the absence of well-developed teacher-student pairs, we first study how to obtain student models with good trade-offs between accuracy and efficiency from the perspectives of model compression and input resolution reduction. Then, we build a benchmark to assess existing KD methods developed in the 2D domain for 3D object detection on six well-constructed teacher-student pairs. Further, we propose an improved KD pipeline that incorporates an enhanced logit KD method, which performs KD only at a few pivotal positions determined by the teacher's classification response, and a teacher-guided student model initialization, which transfers the teacher model's feature extraction ability to the student through weight inheritance. Finally, we conduct extensive experiments on the Waymo dataset. Our best-performing model achieves $65.75\%$ LEVEL 2 mAPH, surpassing its teacher model while requiring only $44\%$ of the teacher's FLOPs. Our most efficient model runs at 51 FPS on an NVIDIA A100, which is $2.2\times$ faster than PointPillars with even higher accuracy.
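The enhanced logit KD idea, distilling only at a few pivotal positions chosen by the teacher's classification response, can be sketched as follows. This is an assumption-laden illustration, not the paper's implementation: positions are flattened into a list (for example, BEV grid cells), pivotal positions are taken as the top-`k` by the teacher's maximum per-class sigmoid score, and the distance is a plain mean squared error on logits. The function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pivotal_positions(teacher_logits, k):
    """Indices of the k positions where the teacher is most confident.

    teacher_logits: one list of per-class logits per position (e.g. flattened BEV cells).
    Confidence = max per-class sigmoid score (an assumption for this sketch).
    """
    scores = [max(sigmoid(z) for z in cls) for cls in teacher_logits]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def pivotal_logit_kd(student_logits, teacher_logits, k):
    """MSE between student and teacher logits, computed only at pivotal positions."""
    idx = pivotal_positions(teacher_logits, k)
    num_classes = len(teacher_logits[0])
    total = 0.0
    for i in idx:
        total += sum((s - t) ** 2 for s, t in zip(student_logits[i], teacher_logits[i]))
    return total / (len(idx) * num_classes)


# Usage: 4 positions x 2 classes. Low-confidence positions 0 and 2 are ignored,
# so the student's large errors there do not contribute to the loss.
teacher = [[-3.0, -4.0], [2.0, 0.0], [-5.0, -5.0], [1.0, 3.0]]
student = [[9.0, 9.0], [2.0, 0.0], [0.0, 0.0], [1.0, 5.0]]
print(pivotal_positions(teacher, 2))          # [3, 1]
print(pivotal_logit_kd(student, teacher, 2))  # 1.0
```

Restricting the loss to a handful of positions keeps the distillation signal focused on likely objects instead of the large, mostly empty background of a 3D scene.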

knowledge distillation, name change, object detection

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.66)

Does Weak-to-strong Generalization Happen under Spurious Correlations?

Chenruo Liu, Yijun Dong, Qi Lei

arXiv.org Machine Learning, Sep-30-2025

We initiate a unified theoretical and algorithmic study of a key problem in weak-to-strong (W2S) generalization: when fine-tuning a strong pre-trained student with pseudolabels from a weaker teacher on a downstream task with spurious correlations, does W2S generalization happen, and how can it be improved when it fails? We consider two sources of spurious correlations caused by group imbalance: (i) a weak teacher fine-tuned on group-imbalanced labeled data with a minority-group fraction $\eta_\ell$, and (ii) a group-imbalanced unlabeled set, pseudolabeled by the teacher, with a minority-group fraction $\eta_u$. Theoretically, a precise characterization of the W2S gain in the proportional asymptotic limit shows that W2S always happens with sufficient pseudolabels when $\eta_u = \eta_\ell$ but may fail when $\eta_u \ne \eta_\ell$, with the W2S gain diminishing as $(\eta_u - \eta_\ell)^2$ increases. Our theory is corroborated by extensive experiments on various spurious-correlation benchmarks and teacher-student pairs. To boost W2S performance when it fails, we further propose a simple, effective algorithmic remedy that retrains the strong student on its high-confidence data subset after W2S fine-tuning. Our algorithm is group-label-free and achieves consistent, substantial improvements over vanilla W2S fine-tuning.
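The remedy described above retrains the strong student on its own high-confidence subset. A minimal sketch of the selection step follows, assuming that confidence is the student's maximum predicted class probability, that kept examples are relabeled with the student's own prediction, and that a fixed `fraction` is kept; all three are illustrative assumptions rather than details from the abstract, and the retraining step itself is omitted. As in the abstract, no group labels are used.

```python
def high_confidence_subset(probs, fraction=0.5):
    """Select the most confident examples of a W2S-fine-tuned student (sketch).

    probs: per-example class-probability lists predicted by the student.
    Assumptions (illustrative): confidence = max class probability; kept examples
    are relabeled with the student's own argmax prediction; `fraction` is fixed.
    No group labels are used.
    Returns (example_index, predicted_label) pairs in index order.
    """
    ranked = sorted(range(len(probs)), key=lambda i: max(probs[i]), reverse=True)
    n_keep = max(1, int(len(probs) * fraction))
    kept = sorted(ranked[:n_keep])
    # Argmax over classes; ties resolve to the lowest class index.
    return [(i, max(range(len(probs[i])), key=probs[i].__getitem__)) for i in kept]


# Usage: the two most confident of four examples are kept for retraining.
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.5, 0.5]]
print(high_confidence_subset(probs))  # [(0, 0), (2, 1)]
```

The selected pairs would then serve as the training set for a second fine-tuning round of the strong student.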

dataset, generalization, spurious correlation

arXiv.org Machine Learning

arXiv:2509.24005

Country: Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

A Limitations and Potential Negative Social Impacts

Neural Information Processing Systems, Aug-19-2025, 01:41:48 GMT

Our work investigates the "larger teacher, worse student" phenomenon in knowledge distillation. However, we only discuss image classification. Therefore, we do not guarantee that our observation holds for other tasks, e.g., object detection. In addition, these classes can be sensitive, e.g., gender. We hope future work can completely resolve this issue. Since most of these methods provide hyperparameters for CIFAR-100, we do not modify them. In Section 2.2, we use a modified ResNet24 as the student to perform KD on a ResNet56 teacher model. We have noted that undistillable classes exist generally across various methods, and Table 1 gives a comprehensive list of the methods we studied.

artificial intelligence, machine learning, undistillable class

Neural Information Processing Systems

Industry: Social Sector (0.41)

Technology: